ABSTRACT

In this paper, we formulate a novel problem: finding blackhole and volcano patterns in a large
directed graph. Specifically, a blackhole pattern is a group of nodes such that there are only
inlinks to this group from the rest of the nodes in the graph. In contrast, a volcano pattern is a
group that has only outlinks to the rest of the nodes in the graph. Both patterns can be observed in
the real world. For instance, in a trading network, a blackhole pattern may represent a group of
traders who are manipulating the market. We first prove that the blackhole mining problem is a dual
problem of finding volcanoes. Therefore, we focus on finding blackhole patterns. Along this line, we
design two pruning schemes to guide the blackhole finding process. In the first pruning scheme, we
strategically prune the search space based on a set of pattern-size-independent pruning rules and
develop an iBlackhole algorithm. The second pruning scheme follows a divide-and-conquer strategy to
further exploit the pruning results from the first pruning scheme. Indeed, a target directed graph
can be divided into several disconnected subgraphs by the first pruning scheme, and thus the
blackhole finding can be conducted in each disconnected subgraph rather than in one large graph.
Based on these two pruning schemes, we also develop an iBlackhole-DC algorithm. Finally,
experimental results on real-world data show that the iBlackhole-DC algorithm can be several orders
of magnitude faster than the iBlackhole algorithm, which in turn has a huge computational advantage
over a brute-force method.



1. INTRODUCTION

Financial institutions and government agencies, such as the U.S. Securities and Exchange Commission
(SEC), face daunting challenges in the field of financial fraud detection. The sophistication of
criminals' tactics makes detecting and preventing fraud difficult, especially as the number of
trading accounts and the volume of transactions grow dramatically. Indeed, trading networks become
more vulnerable as the number of accounts and the volume of transactions grow quickly. In
particular, criminals know that fraud detection systems are not good at correlating user behavior
across multiple trading accounts. This weakness opens the door for cross-account collaborative
fraud, which is difficult to discover, track, and resolve because the activities of the fraudsters
usually appear to be normal trading activities. For instance, in a trading network with a large
number of nodes and directed edges, a trader or a group of traders can trade only among several
accounts for the purpose of manipulating the market. This kind of illegal trading activity is widely
known as a trading ring.

In this paper, we study a special type of trading-ring patterns, called blackhole and volcano
patterns. Given a directed graph, a blackhole pattern is a group of nodes such that there are only
inlinks to this group from the rest of the nodes in the graph. In contrast, a volcano pattern is a
group that has only outlinks to the rest of the nodes in the graph. To the best of our knowledge,
this is the first work to introduce the concepts of blackhole and volcano patterns in directed
graphs. In fact, both blackhole and volcano patterns can be observed in real-world trading networks.
For example, a blackhole pattern can represent a group of traders who are manipulating the market by
performing transactions on a specific stock among themselves for a specific time period. In other
words, the overall shares of the target stock in their trading accounts can only increase during
this time period, while these traders produce a large volume of transactions on this stock. After
the stock price goes up to a certain degree, these traders start selling off their shares to the
public. In this stage, these trading accounts form a volcano pattern, which has only outlinks to the
rest of the public accounts.

However, the process of finding blackhole and volcano patterns can be computationally prohibitive,
since the problem is combinatorial in nature. To address this challenge, we first prove that the
blackhole pattern mining problem is a dual problem of finding volcano patterns. Therefore, we can
focus on finding blackhole patterns. Along this line, we design two pruning schemes to guide the
blackhole pattern mining process. In the first pruning scheme, we identify a set of
pattern-size-independent pruning rules by studying the structural graph properties of blackhole
patterns. These pruning rules can be used for pruning the search space regardless of the size of the
patterns. Based on the first pruning scheme, we design an iBlackhole algorithm for finding blackhole
patterns. In contrast, the second pruning scheme follows a divide-and-conquer strategy to further
exploit the pruning results from the first pruning scheme. Specifically, because a target directed
graph can be divided into several disconnected subgraphs by the first pruning scheme, it becomes
much more efficient to find blackhole patterns in each disconnected subgraph rather than in one
large graph. Based on these two pruning schemes, we develop an even more effective algorithm, named
iBlackhole-DC, for mining blackhole patterns in directed graphs. Furthermore, we provide proofs of
the completeness and correctness of both the iBlackhole and iBlackhole-DC algorithms.

Finally, experimental results on several real-world directed networks are provided to show the
pruning effect of the two pruning schemes. As shown in the experiments, the iBlackhole algorithm has
a huge computational advantage over a brute-force approach. Also, the iBlackhole-DC algorithm is
several orders of magnitude faster than the iBlackhole algorithm. We also show the effectiveness of
blackhole patterns for finding some interesting stock movement patterns.



2. PRELIMINARIES

In this section, we introduce some basic concepts and notations that will be used in this paper. 

First, consider a directed graph $G = (V, E)$ [5], where $V$ is the set of all nodes and $E$ is the
set of all edges. Assume that $G$ has no self-loops and has at most one edge between any pair of
nodes. A directed edge $e$ in $G$ is denoted as $e = (x, y)$, where $x$ and $y$ are nodes of $G$ and
the arc is directed from $x$ to $y$. Each edge $e$ has a positive weight, denoted as $\omega_e$.

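Definition 1 (in-weight and out-weight). For a set of nodes $B \subseteq V$, let $C = V \setminus B$.
The in-weight of $B$ is defined as

$$d_{in}(B) = \sum_{e = (x, y) \in E,\ x \in C,\ y \in B} \omega_e .$$

The out-weight of $B$ is defined similarly:

$$d_{out}(B) = \sum_{e = (x, y) \in E,\ x \in B,\ y \in C} \omega_e .$$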

Figure 1 shows an example of the in-weight and out-weight of a set of nodes. The number associated
with each edge is the weight of that edge. In this figure, the in-weight of $B$ is 6 + 5 = 11, while
the out-weight is 3 + 3 + 1 + 2 = 9.
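
Definition 1 can be illustrated with a minimal sketch. The edge-dictionary representation, the function names, and the toy graph below are assumptions made for illustration; the toy graph is not the graph of Figure 1, although its boundary weights were chosen to give the same sums.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def in_weight(weights: Dict[Edge, float], group: Iterable[Hashable]) -> float:
    """Total weight of edges that enter `group` from nodes outside it."""
    members = set(group)
    return sum(w for (x, y), w in weights.items()
               if x not in members and y in members)


def out_weight(weights: Dict[Edge, float], group: Iterable[Hashable]) -> float:
    """Total weight of edges that leave `group` towards nodes outside it."""
    members = set(group)
    return sum(w for (x, y), w in weights.items()
               if x in members and y not in members)


# Hypothetical weighted graph: keys are directed edges (x, y), values are weights.
toy = {
    ("x1", "b1"): 6, ("x2", "b2"): 5,                    # edges entering B
    ("b1", "y1"): 3, ("b1", "y2"): 3,                    # edges leaving B
    ("b2", "y2"): 1, ("b2", "y3"): 2,
    ("b1", "b2"): 4,                                     # internal edge, ignored
}
B = {"b1", "b2"}
print(in_weight(toy, B), out_weight(toy, B))
```

Edges with both endpoints inside the group contribute to neither weight, which is why the internal edge does not affect the result.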

Next, we define a blackhole in a directed graph as follows.

Definition 2 (blackhole). Given a directed graph $G = (V, E)$, we say that a set of nodes
$B \subseteq V$ forms a blackhole if and only if the following two conditions are satisfied: 1) if
$|B| \geq 2$, the subgraph $G(B)$ induced by $B$ is weakly connected, and 2)
$d_{in}(B) / d_{out}(B) > \theta$, where $\theta$ is a pre-specified positive threshold and is
typically a very large value.

Finally, we present the definition of the volcano in a directed graph as follows. 



Definition 3 (volcano). Given a directed graph $G = (V, E)$, we say that a set of nodes
$Vol \subseteq V$ forms a volcano if and only if the following two conditions are satisfied: 1) if
$|Vol| \geq 2$, the subgraph $G(Vol)$ induced by $Vol$ is weakly connected, and 2)
$d_{out}(Vol) / d_{in}(Vol) > \theta$, where $\theta$ is a pre-specified positive threshold and is
typically a very large value.




Figure 1: Illustration: in-weight and out-weight 



3. PROBLEM FORMULATION 

In this section, we formulate the problems of detecting blackhole and volcano patterns in a directed 
graph. 

3.1 A General Problem Formulation 

Given a directed graph $G = (V, E)$, the goal of detecting blackhole patterns is to find the
blackhole set, denoted as Blackhole, such that 1) each element $B \in$ Blackhole satisfies
$B \subseteq V$ and the definition of a blackhole, and 2) any other set of nodes $C \subseteq V$ with
$C \notin$ Blackhole does not satisfy the blackhole definition. The problem of detecting volcano
patterns can be formulated in a similar fashion.

Next, we show that the problem of detecting blackhole patterns is a dual problem of detecting volcano 
patterns. 

Theorem 1. The problem of finding the blackhole set in a directed graph is a dual problem of
finding the volcano set in the same directed graph.

Proof. Consider a directed graph $G = (V, E)$. Let $G' = (V, E')$ be the inverse graph of $G$, where
all the nodes in $G'$ are the same as in $G$, while for each edge $e = (x, y) \in E$ there is an edge
$e' = (y, x) \in E'$, and the weight associated with $e'$ is exactly the same as the weight
associated with $e$. Therefore, the in-weight of a set of nodes $B$ in $G$ is exactly the same as the
out-weight of $B$ in $G'$, and vice versa. If $B$ is a blackhole in $G$, which means
$d_{in}(B) / d_{out}(B) > \theta$ in $G$, then in $G'$ we have $d_{out}(B) / d_{in}(B) > \theta$.
Therefore, $B$ forms a volcano in $G'$. As a result, the problem of finding the blackhole set in $G$
is equivalent to the problem of finding the volcano set in $G'$. □

We now know that the problem of detecting the blackhole set in the original directed graph is
equivalent to the problem of detecting the volcano set in the inverse graph. Therefore, in the rest
of this paper, we focus only on detecting blackhole patterns in a directed graph.
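
The construction used in the proof of Theorem 1 can be checked numerically with a short sketch. The helper names and the toy graph are hypothetical; the only point illustrated is that reversing every edge swaps the in-weight and the out-weight of any node set.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def boundary_weights(weights: Dict[Edge, float],
                     group: Iterable[Hashable]) -> Tuple[float, float]:
    """Return (in-weight, out-weight) of `group`."""
    members = set(group)
    d_in = sum(w for (x, y), w in weights.items()
               if x not in members and y in members)
    d_out = sum(w for (x, y), w in weights.items()
                if x in members and y not in members)
    return d_in, d_out


def inverse_graph(weights: Dict[Edge, float]) -> Dict[Edge, float]:
    """Flip every edge (x, y) to (y, x) while keeping its weight."""
    return {(y, x): w for (x, y), w in weights.items()}


# Hypothetical graph used only for this check.
g = {("a", "b"): 2, ("c", "b"): 1, ("b", "c"): 4, ("c", "d"): 3}
group = {"b", "c"}
d_in, d_out = boundary_weights(g, group)
inv_in, inv_out = boundary_weights(inverse_graph(g), group)
print((d_in, d_out), (inv_in, inv_out))
```

A volcano detector can therefore be obtained by running any blackhole detector on `inverse_graph(g)`.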

3.2 A Simplified Problem Formulation 



The general problem of detecting blackhole patterns stated above is very complex. Instead, in this
paper, we focus on a more practical version of this problem. Specifically, we exploit two
constraints to simplify the general problem as follows. 1) The weights associated with all edges are
equal to 1. Under this constraint, the in-weight of a node becomes its in-degree and the out-weight
becomes its out-degree. 2) Instead of considering the general version of a blackhole, which
satisfies $d_{in}(B) / d_{out}(B) > \theta$, we simplify the blackhole definition to
$d_{out}(B) = 0$.

Figure 2 shows an example of the simplified blackhole patterns. In this figure, there are two blackhole 
patterns, which have been highlighted by dashed circles. 

Figure 2: An illustration of the simplified blackhole



4. ALGORITHM DESIGN

In this section, we introduce the algorithms for detecting blackhole patterns in a directed graph. 

4.1 A Brute-Force Approach 

First, we present a brute-force approach for finding blackhole patterns. As we know, a set of nodes
$B \subseteq V$ is a blackhole if and only if: 1) if $|B| \geq 2$, the subgraph $G(B)$ induced by
$B$ is weakly connected, and 2) $d_{out}(B) = 0$. Therefore, the intuition is simple: all possible
combinations of the nodes in $G$ are checked by exhaustive search. For each combination $B$, if the
subgraph $G(B)$ induced by $B$ is weakly connected and $d_{out}(B) = 0$, then $B$ is a blackhole in
the directed graph $G$.

In real-world scenarios, it is typically computationally prohibitive to find all the blackhole
patterns, since the number of combinations of nodes grows exponentially with the number of nodes. A
practical alternative is to find blackhole patterns that include only a limited number of nodes.
Here, we introduce the concept of n-node blackholes.

Definition 4 (n-node blackhole). Given a directed graph $G = (V, E)$, we say that a set of nodes
$B \subseteq V$ is an n-node blackhole if and only if the following two conditions are satisfied: 1)
$B$ is a blackhole in $G$, and 2) $B \in V(n)$, where $V(n)$ is the set of all possible subsets
containing $n$ nodes in $V$; that is, $|B| = n$.

Figure 3 shows the pseudocode of the brute-force algorithm to detect 1 through n-node blackhole
patterns in a directed graph $G$. Since we consider all possible combinations of nodes in $G$ of
sizes 1 through $n$, this algorithm is complete. Also, since we check whether each combination of
nodes satisfies the definition of a blackhole, this algorithm is correct.



ALGORITHM Brute-Force(G = (V, E), n)
Input:
    G: the directed graph
    V: the set of all nodes
    E: the set of all edges
    n: max number of nodes each blackhole may contain
Output:
    Blackhole: 1 to n-node blackhole set of G

Blackhole ← ∅
for i ← 1 to n do
    for each B ∈ V(i) do
        if G(B) is weakly connected then
            if d_out(B) = 0 then
                Blackhole ← Blackhole ∪ {B}
            end if
        end if
    end for
end for
return Blackhole



Figure 3: The brute-force algorithm 
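
The brute-force procedure can be sketched in a few lines for the simplified problem (unit weights, $d_{out}(B) = 0$). The adjacency-set representation, the function names, and the toy graph are assumptions for illustration; every node is assumed to appear as a key of the dictionary.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of direct successors


def is_weakly_connected(graph: Graph, group: FrozenSet[Hashable]) -> bool:
    """True if the subgraph induced by `group` is connected when edge
    directions are ignored."""
    start = next(iter(group))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in group:
            if v not in seen and (v in graph[u] or u in graph[v]):
                seen.add(v)
                stack.append(v)
    return len(seen) == len(group)


def has_no_outlinks(graph: Graph, group: FrozenSet[Hashable]) -> bool:
    """True if every successor of every member stays inside `group`."""
    return all(graph[u] <= group for u in group)


def brute_force_blackholes(graph: Graph, n: int) -> List[FrozenSet[Hashable]]:
    """Enumerate all node combinations of size 1..n and keep the blackholes."""
    found = []
    for i in range(1, n + 1):
        for combo in combinations(list(graph), i):
            group = frozenset(combo)
            if is_weakly_connected(graph, group) and has_no_outlinks(graph, group):
                found.append(group)
    return found


# Hypothetical toy graph: d -> a -> b <-> c.
toy = {"a": {"b"}, "b": {"c"}, "c": {"b"}, "d": {"a"}}
print(brute_force_blackholes(toy, 3))
```

The number of combinations examined is the sum of binomial coefficients C(|V|, i) for i from 1 to n, which is what makes this approach impractical beyond small graphs.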



4.2 A Scheme of the iBlackhole Algorithm

In general, finding blackhole patterns in a directed graph is a combinatorial problem. Therefore, as
the number of nodes $n$ increases, the computation time increases exponentially, which makes the
brute-force algorithm impractical for large values of $n$. To this end, we introduce some
pattern-size-independent pruning rules to reduce the search space. The key idea behind these pruning
rules is to find as many irrelevant nodes as possible, that is, nodes that have no chance to form an
n-node blackhole, and to eliminate these nodes from the candidate search list. In this way, the
search space can be reduced dramatically. The algorithm developed based on these pruning rules is
named iBlackhole.

Figure 4 shows the scheme of the iBlackhole algorithm for detecting 1 through n-node blackhole
patterns in a directed graph $G$. In this algorithm, the blackhole patterns are identified one size
at a time, according to their number of nodes. In each step of finding the i-node blackhole
patterns, a potential list $P_i$ is first established. Only the nodes in this potential list can
possibly form an i-node blackhole. In other words, nodes that are not in this list have no chance to
be in an i-node blackhole pattern. Then the nodes in $P_i$ are examined one after another, and
irrelevant nodes are deleted based on some pruning rules. The result of this pruning is a candidate
list $C_i$. Each node $v$ in $C_i$ is then checked again, and irrelevant nodes are removed from
$C_i$ using some additional pruning rules. Finally, we obtain the final search list $F_i$, and we
can apply the brute-force algorithm on $F_i$ to find the i-node blackhole patterns. More details
about this algorithm are given after we introduce the pruning rules.

4.3 Pruning Rules 

In this section, we introduce pruning rules associated with the potential list, the candidate list, and the 
final search list. 

Definition 5 (directed path). Given a directed graph $G = (V, E)$, let
$v_0, v_1, v_2, \ldots, v_k \in V$ and $e_1, e_2, \ldots, e_k \in E$, where $e_i = (v_{i-1}, v_i)$.
We say that the sequence $v_0 e_1 v_1 e_2 v_2 \cdots e_k v_k$ forms a directed path from $v_0$ to
$v_k$ if $v_i \neq v_j$ for all $0 \leq i, j \leq k$, $i \neq j$. The length of this directed path
is $k$.



Definition 6 (reachable). Given a directed graph $G = (V, E)$ and $u, v \in V$, we say that $v$ is
reachable from $u$ if there is a directed path that starts from $u$ and ends at $v$.



Definition 7 (predecessor and successor). Given a directed graph $G = (V, E)$ and $u, v \in V$, if
$v$ is reachable from $u$, then we say $u$ is a predecessor of $v$, and $v$ is a successor of $u$.
If there is an edge from $u$ to $v$, then $u$ is a direct predecessor of $v$, and $v$ is a direct
successor of $u$.



ALGORITHM iBlackhole(G = (V, E), n)
Input:
    G: the directed graph
    V: the set of all nodes
    E: the set of all edges
    n: max number of nodes each blackhole may contain
Output:
    Blackhole: 1 to n-node blackhole set of G

Blackhole ← ∅
for i ← 1 to n do
    establish potential list P_i
    remove irrelevant nodes from P_i, get candidate list C_i
    remove irrelevant nodes from C_i, get final list F_i
    apply the Brute-Force Algorithm on F_i
        to find out the i-node blackhole patterns
end for
return Blackhole



Figure 4: A scheme of the iBlackhole algorithm



Lemma 1. If a node $v \in B$, where $B \subseteq V$ is a blackhole, then all the direct successors
of $v$ are in $B$.

Proof. This can be proved by contradiction. Assume that at least one of $v$'s direct successors,
$s$, satisfies $s \notin B$. Then we have $d_{out}(B) \geq 1$, since $e = (v, s) \in E$ and
$s \notin B$. This contradicts the definition of a blackhole. Therefore, all direct successors of
$v$ must be in $B$. □

Based on Lemma 1, we have the following lemma.

Lemma 2. In an n-node blackhole $B$, the maximum out-degree of any node in $B$ is $n-1$.

Proof. This can be proved by contradiction. Suppose there is a node $v$ with out-degree at least $n$
in an n-node blackhole $B$. Then $v$ has at least $n$ direct successors, denoted as
$s_1, s_2, \ldots, s_n$. According to Lemma 1, since $v \in B$, all of $s_1, s_2, \ldots, s_n$ must
be in $B$, which, together with $v$ itself, makes the size of this blackhole at least $n+1$. This is
a contradiction. Therefore, the maximum out-degree of any node in an n-node blackhole is no greater
than $n-1$. □

According to Lemma 2, we can derive the following theorem for pruning the potential list $P_i$.

Theorem 2. For the potential list $P_i$, only nodes with out-degree less than $i$ need to be
considered.

Proof. By Lemma 2, in an n-node blackhole, the maximum out-degree of any node is $n-1$. In other
words, nodes with out-degree greater than $i-1$ have no chance to be in an i-node blackhole.
Therefore, only nodes with out-degree less than $i$ need to be included in the potential list
$P_i$. □

According to Theorem 2, only the nodes with out-degree less than $i$ are used to establish the
potential list $P_i$. Once $P_i$ is available, some additional pruning rules can be applied to
remove irrelevant nodes from $P_i$ and obtain the candidate list $C_i$.



Lemma 3. For each node $v \in P_i$, if at least one of $v$'s direct successors $s$ satisfies
$s \notin P_i$, then $v \notin C_i$.

Proof. This can be proved by contradiction. Since $s \notin P_i$, $s$ has no chance to be in an
i-node blackhole. Assume that $v$ belongs to an i-node blackhole $B$. According to Lemma 1, all of
$v$'s direct successors, which include $s$, also belong to $B$. This is a contradiction. Therefore,
$v$ has no chance to form an i-node blackhole, and thus $v$ can be removed from $P_i$ safely; that
is, $v \notin C_i$. □

After a node is removed from $P_i$, some other nodes associated with it can also be removed from
$P_i$.

Lemma 4. If a node $v$ is removed from $P_i$, then all of its direct predecessors can also be
removed from $P_i$.

Proof. For each of $v$'s direct predecessors $p$, $v$ is a direct successor of $p$. Since $v$ has
been removed from $P_i$, we have $v \notin P_i$. According to Lemma 3, $p$ should also be removed
from $P_i$. Therefore, all of $v$'s direct predecessors can be removed from $P_i$. □

By Lemma 4, when a node $v$ is removed from $P_i$, all of its direct predecessors should also be
removed. The newly removed direct predecessors then play the role of $v$ in turn, so the cascading
delete eventually spreads to all of $v$'s predecessors. Figure 5 shows an example of the cascading
delete process when removing node $v$ from the potential list $P_3$. The shaded nodes in the figure
are the nodes removed from $P_3$. In Figure 5(a), $s$ has an out-degree of 3, which excludes it from
$P_3$ in the first place. When the nodes in $P_3$ are checked one after another, it can be noticed
that $v$ has a successor $s$ that is not in $P_3$. Therefore, $v$ is removed from $P_3$. Then all of
$v$'s direct predecessors are deleted from $P_3$, as shown in Figure 5(b), and this process spreads
to all of $v$'s predecessors in Figure 5(c).
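
Theorem 2 together with Lemmas 3 and 4 can be sketched as a single cascading deletion. In this sketch the cascade is seeded with every node excluded by Theorem 2 and then walks backwards along edges, which is an assumed implementation choice rather than the exact order of checks described above; the function name and the toy graph are hypothetical.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of direct successors


def candidate_list(graph: Graph, i: int) -> Set[Hashable]:
    """Return C_i: nodes that survive Theorem 2 and the cascading delete."""
    # Theorem 2: only nodes with out-degree less than i enter the potential list.
    remaining = {v for v, succs in graph.items() if len(succs) < i}

    # Direct predecessors of every node, needed to walk edges backwards.
    preds: Graph = {v: set() for v in graph}
    for u, succs in graph.items():
        for v in succs:
            preds[v].add(u)

    # Lemmas 3 and 4: a node pointing to a removed node is removed as well,
    # and the removal keeps spreading to its own direct predecessors.
    stack = [v for v in graph if v not in remaining]
    while stack:
        removed = stack.pop()
        for p in preds[removed]:
            if p in remaining:
                remaining.discard(p)
                stack.append(p)
    return remaining


# Hypothetical graph: s has out-degree 3 and q -> p -> v -> s is a chain into it.
toy = {
    "s": {"x", "y", "z"}, "x": set(), "y": set(), "z": set(),
    "v": {"s"}, "p": {"v"}, "q": {"p"},
    "a": {"b"}, "b": {"a"},
}
print(sorted(candidate_list(toy, 3)))
```

Each node is pushed onto the stack at most once, so the cascade runs in time linear in the number of nodes and edges.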







Figure 5: Illustration: the cascading delete process 



Lemma 3 and Lemma 4 are the pruning rules that are applied to $P_i$ to get the candidate list $C_i$.
Nonetheless, some of the nodes in $P_i$ do not need to be examined and will definitely be in $C_i$.

Lemma 5. If a node $v \in C_{i-1}$, then $v \in C_i$.

Proof. Clearly, $C_{i-1} \subseteq P_{i-1}$ and $P_{i-1} \subseteq P_i$. Therefore,
$C_{i-1} \subseteq P_i$. In other words, if $v \in C_{i-1}$, then $v \in P_i$. Since all the nodes
in $C_{i-1}$ only point to other nodes in $C_{i-1}$, all their successors are still in $C_{i-1}$.
Therefore, when the pruning rules (Lemma 3 and Lemma 4) are applied to $P_i$, there is no chance for
$v$ to be removed from $P_i$ by these pruning rules. Finally, we know $v \in C_i$. □

According to Lemma 5, there is no need to examine the nodes in $C_{i-1}$ when applying pruning rules
to $P_i$, which makes this step more efficient.



Before we can continue to introduce the pruning strategies, we would like to introduce another concept 
here. 

Definition 8 (closure). Given a directed graph $G = (V, E)$ and $v \in V$, the closure of $v$,
denoted as $v^+$, is defined as: $v^+ = \{s \mid$ there is a directed path from $v$ to $s\} \cup \{v\}$.

Figure 6 shows an example of the closure of node $v$. In this figure, $v^+ = \{v, a, c, b, h, g\}$.




Figure 6: An example of the closure of node v 

Indeed, the closure of a node has the following important property.

Theorem 3. The closure of a node $v$ is a blackhole. Furthermore, it is a subset of any blackhole
which contains $v$.

Proof. For the first part of this theorem, by the definition of closure, $v^+$ is the set of all
nodes reachable from $v$, together with $v$. Since every node in $v^+$ is reachable from $v$, the
subgraph induced by $v^+$ is weakly connected. If $v^+$ does not form a blackhole, there has to be
at least one edge $e = (s, t) \in E$ such that $s \in v^+$ and $t \notin v^+$. If $s = v$, then $t$
is reachable from $v$, so $t \in v^+$; if $s \neq v$, since there is a directed path from $v$ to $s$
and $e = (s, t)$, $t$ can be reached from $v$, so again $t \in v^+$. In either case, we have a
contradiction. Therefore, $v^+$ is a blackhole.

For the second part of this theorem, if a blackhole $B$ contains $v$, then by Lemma 1 all of $v$'s
direct successors are in $B$. These direct successors then play the role of $v$ in turn, so this
argument eventually spreads to all of $v$'s successors. This leads to $v^+ \subseteq B$. □
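
The closure of Definition 8 is a plain reachability computation, as the following minimal sketch shows. The function name and the toy graph are hypothetical; the final check mirrors the first part of Theorem 3 by confirming that no edge leaves the closure.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of direct successors


def closure(graph: Graph, v: Hashable) -> Set[Hashable]:
    """Return v+ : the node v together with every node reachable from it."""
    seen, stack = {v}, [v]
    while stack:
        for s in graph[stack.pop()]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


# Hypothetical graph: u points into v, and v reaches a, b, and c.
toy = {"u": {"v"}, "v": {"a", "c"}, "a": {"b"}, "b": set(), "c": {"a"}}
v_plus = closure(toy, "v")
print(sorted(v_plus))

# No edge leaves the closure, so d_out(v+) = 0.
print(all(toy[x] <= v_plus for x in v_plus))
```

Because any blackhole containing $v$ must contain all of `closure(graph, v)`, the size of the closure is a lower bound on the size of every blackhole that includes $v$.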

The closure property (Theorem 3) can be used to derive pruning rules that remove irrelevant nodes
from $C_i$ and finally lead to the final search list $F_i$.

Lemma 6. For each node $v \in C_i$, if $|v^+| > i$, then $v$ and all its predecessors are not in
$F_i$.

Proof. By Theorem 3, $v^+$ is a subset of any blackhole which contains $v$. Suppose $v$ is in an
i-node blackhole $B$. Then we have $v^+ \subseteq B$, so $|B| \geq |v^+| > i$, which is a
contradiction. Therefore, $v$ has no chance to form an i-node blackhole, and we can remove $v$ from
$C_i$ safely. Then a similar cascading delete procedure can be applied, and thus all of $v$'s
predecessors can be deleted from $C_i$. Therefore, $v$ and all its predecessors will not be in
$F_i$. □

Lemma 7. For each node $v \in C_i$, if $|v^+| = i$, then $v^+$ can be output as an i-node blackhole.
Also, $v$ and all its predecessors can be removed from $C_i$.



Proof. According to Theorem 3, $v^+$ is a blackhole. Since $|v^+| = i$, $v^+$ can be output as an
i-node blackhole. Assume that $v$ is also in another blackhole $B$. By Theorem 3,
$v^+ \subseteq B$. If $|B| > i$, $B$ is not an i-node blackhole and cannot be output as one; if
$|B| = i$, then $B$ is exactly $v^+$, which has already been output as an i-node blackhole. In
either situation, we can remove $v$ from $C_i$. Then a similar cascading delete procedure can be
applied, and thus all of $v$'s predecessors can be deleted from $C_i$. □

Lemma 6 and Lemma 7 are used as pruning rules to prune the candidate list $C_i$ and get the final
search list $F_i$. Once $F_i$ is available, we can apply the brute-force approach on $F_i$ to find
the i-node blackhole patterns. In the next subsection, we give the details of the iBlackhole
algorithm.

4.4 The iBlackhole Algorithm

The iBlackhole algorithm exploits the pruning rules stated in Lemma 1 through Lemma 7. Figure 7
shows the detailed pseudocode of the iBlackhole algorithm. Specifically, Line 4 establishes the
potential list $P_i$. Lines 5-14 remove irrelevant nodes from $P_i$ and get the candidate list
$C_i$. Lines 15-26 remove irrelevant nodes from $C_i$ and get the final search list $F_i$. Lines
27-33 apply the brute-force approach on $F_i$ to find the i-node blackhole patterns.



ALGORITHM iBlackhole(G = (V, E), n)
Input:
    G: the directed graph
    V: the set of all nodes
    E: the set of all edges
    n: max number of nodes each blackhole may contain
Output:
    Blackhole: 1 to n-node blackhole set of G

1.  Blackhole ← ∅
2.  C_0 ← ∅
3.  for i ← 1 to n do
4.      P_i ← {v | d_out(v) < i}
5.      for each v in P_i do
6.          if v ∉ C_{i-1} then
7.              if at least one of v's direct
8.                      successors is not in P_i then
9.                  remove v from P_i
10.                 remove all v's predecessors from P_i
11.             end if
12.         end if
13.     end for
14.     C_i ← P_i
15.     for each v in C_i do
16.         if |v+| > i then
17.             remove v from C_i
18.             remove all v's predecessors from C_i
19.         end if
20.         if |v+| = i then
21.             Blackhole ← Blackhole ∪ {v+}
22.             remove v from C_i
23.             remove all v's predecessors from C_i
24.         end if
25.     end for
26.     F_i ← C_i
27.     for each B ∈ F_i(i) do
28.         if G(B) is weakly connected then
29.             if d_out(B) = 0 then
30.                 Blackhole ← Blackhole ∪ {B}
31.             end if
32.         end if
33.     end for
34. end for
35. return Blackhole



Figure 7: The iBlackhole algorithm 
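
The overall procedure can be sketched compactly for the simplified problem. This is an illustrative sketch under stated assumptions, not the implementation evaluated in this paper: the graph is an adjacency-set dictionary in which every node is a key, the candidate list is computed by removing everything that can reach a node excluded by Theorem 2, the Lemma 5 shortcut is omitted, and all names and the toy graph are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of direct successors


def _reach(adj: Graph, start: Hashable) -> Set[Hashable]:
    """`start` plus every node reachable from it by following `adj`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def _weakly_connected(graph: Graph, group: FrozenSet[Hashable]) -> bool:
    start = next(iter(group))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in group - seen:
            if v in graph[u] or u in graph[v]:
                seen.add(v)
                stack.append(v)
    return seen == group


def iblackhole(graph: Graph, n: int) -> Set[FrozenSet[Hashable]]:
    preds: Graph = {v: set() for v in graph}
    for u, succs in graph.items():
        for v in succs:
            preds[v].add(u)

    blackholes: Set[FrozenSet[Hashable]] = set()
    for i in range(1, n + 1):
        # Potential list P_i (Theorem 2).
        potential = {v for v in graph if len(graph[v]) < i}
        # Candidate list C_i (Lemmas 3 and 4): drop every node that is, or
        # can reach, a node excluded from P_i.
        candidates = set(potential)
        for v in graph:
            if v not in potential:
                candidates -= _reach(preds, v)
        # Final list F_i (Lemmas 6 and 7), using the closure v+.
        final = set(candidates)
        for v in candidates:
            if v not in final:
                continue
            v_plus = _reach(graph, v)
            if len(v_plus) >= i:
                if len(v_plus) == i:
                    blackholes.add(frozenset(v_plus))
                final -= _reach(preds, v)  # v and all its predecessors
        # Brute force restricted to F_i.
        for combo in combinations(final, i):
            group = frozenset(combo)
            if _weakly_connected(graph, group) and all(graph[u] <= group for u in group):
                blackholes.add(group)
    return blackholes


# Hypothetical toy graph with a high out-degree node s and two 3-node blackholes.
toy = {
    "a": {"c"}, "b": {"c"}, "c": set(),
    "i": {"j", "k"}, "j": {"k"}, "k": set(),
    "s": {"a", "b", "i"}, "v": {"s"}, "p": {"v"},
}
print(sorted(sorted(b) for b in iblackhole(toy, 3)))
```

On this toy graph the brute-force step for i = 3 inspects the 10 three-node subsets of a five-node final list instead of the 84 three-node subsets of the full node set.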



Completeness and Correctness. In the iBlackhole algorithm, only the nodes that have no chance to
form an i-node blackhole pattern are removed in each iteration $i$ (this is guaranteed by Lemma 1
through Lemma 7). In other words, every combination of nodes that could form an i-node blackhole is
still examined when the brute-force search is applied to $F_i$, so this algorithm is complete. Also,
since we check whether each candidate pattern satisfies the definition of a blackhole, this
algorithm is correct.

Figure 8 shows an example of the procedure of the iBlackhole algorithm when searching for 3-node
blackhole patterns. The shaded nodes in the figure are nodes which have been deleted. In Figure
8(a), $s$ has an out-degree of 3, so $s$ is excluded from $P_3$ in the first place. In Figure 8(b),
when we check the nodes in $P_3$ one after another, we notice that $v$ has a successor $s$ that is
not in $P_3$. Therefore, we can remove $v$ from $P_3$. Then all the direct predecessors of $v$ are
deleted from $P_3$ in cascade, and this delete process spreads to all of $v$'s predecessors.
Finally, we have the candidate list $C_3 = \{a, b, c, i, j, k\}$. In Figure 8(c), we find that
$|i^+| = 3$. Therefore, we output $i^+ = \{i, j, k\}$ as a 3-node blackhole and delete $i$ from
$C_3$. Now we have the final search list $F_3 = \{a, b, c, j, k\}$. In Figure 8(d), we examine each
3-combination of nodes in $F_3$ and find a 3-node blackhole $\{a, b, c\}$. Therefore, there are two
3-node blackhole patterns in this example, $\{a, b, c\}$ and $\{i, j, k\}$.





Figure 8: An example of the procedure of the iBlackhole Algorithm 



4.5 The iBlackhole-DC Algorithm

While the search space has been reduced dramatically in the iBlackhole algorithm, it is still
possible to develop additional pruning strategies for graphs with some special characteristics.
Indeed, by the blackhole definition, a node $v \in V$ can only form a blackhole pattern with nodes
within the same weakly connected component of $G$. Therefore, if a directed graph has several weakly
connected components, which are not connected to each other, a divide-and-conquer pruning strategy
can be exploited: first identify these weakly connected components, and then conduct the blackhole
finding method in each weakly connected component. This pruning strategy divides one large,
exponentially growing search space into several much smaller ones, and thus saves a great deal of
computational cost.

In this paper, we combine the iBlackhole algorithm with this divide-and-conquer pruning strategy and
develop an even more effective algorithm, named iBlackhole-DC, for finding blackhole patterns.
Figure 9 shows the scheme of this algorithm for finding 1 through n-node blackhole patterns in a
directed graph $G$.

The completeness and correctness of the iBlackhole-DC algorithm are straightforward, since the only
difference between iBlackhole and iBlackhole-DC is the use of the divide-and-conquer strategy. We
know that the iBlackhole algorithm is complete and correct. Also, the divide-and-conquer strategy
only separates nodes which cannot form blackhole patterns together. Therefore, the iBlackhole-DC
algorithm is also complete and correct.



ALGORITHM iBlackhole-DC(G = (V, E), n)
Input:
    G: the directed graph
    V: the set of all nodes
    E: the set of all edges
    n: max number of nodes each blackhole may contain
Output:
    Blackhole: 1 to n-node blackhole set of G

Blackhole ← ∅
for i ← 1 to n do
    establish potential list P_i
    remove useless nodes from P_i, get candidate list C_i
    remove useless nodes from C_i, get final list F_i
    for each weakly connected component in G(F_i) do
        apply the Brute-Force Algorithm to
            find out the i-item blackhole set {B}
        Blackhole ← Blackhole ∪ {B}
    end for
end for
return Blackhole



Figure 9: A scheme of the iBlackhole-DC algorithm 
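
The divide-and-conquer step can be sketched as follows: split the subgraph induced by the final list into weakly connected components, then run the brute-force check inside each component only. The function names, the adjacency-set representation, and the toy graph are assumptions for illustration; the final list is passed in directly rather than computed.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of direct successors


def weak_components(graph: Graph, nodes: Set[Hashable]) -> List[Set[Hashable]]:
    """Weakly connected components of the subgraph induced by `nodes`."""
    undirected = {v: set() for v in nodes}
    for u in nodes:
        for v in graph[u]:
            if v in nodes:
                undirected[u].add(v)
                undirected[v].add(u)
    components, unvisited = [], set(nodes)
    while unvisited:
        start = unvisited.pop()
        comp, stack = {start}, [start]
        while stack:
            for nxt in undirected[stack.pop()]:
                if nxt not in comp:
                    comp.add(nxt)
                    stack.append(nxt)
        unvisited -= comp
        components.append(comp)
    return components


def search_by_component(graph: Graph, final: Set[Hashable],
                        i: int) -> Set[FrozenSet[Hashable]]:
    """Find i-node blackholes by brute force inside each component of G(F_i)."""
    found: Set[FrozenSet[Hashable]] = set()
    for comp in weak_components(graph, final):
        if len(comp) < i:
            continue  # too small to contain an i-node blackhole
        for combo in combinations(comp, i):
            group = frozenset(combo)
            connected = len(weak_components(graph, set(group))) == 1
            if connected and all(graph[u] <= group for u in group):
                found.add(group)
    return found


# Hypothetical graph and a final list F_3 assumed to come from the pruning steps.
toy = {
    "a": {"c"}, "b": {"c"}, "c": set(),
    "i": {"j", "k"}, "j": {"k"}, "k": set(),
    "s": {"a", "b", "i"}, "v": {"s"}, "p": {"v"},
}
final_3 = {"a", "b", "c", "j", "k"}
print(search_by_component(toy, final_3, 3))
```

Here the final list splits into components of sizes 3 and 2, so only one 3-node subset is examined instead of the 10 subsets of the undivided list.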



5. EXPERIMENTAL RESULTS 

Here, we present experimental results to evaluate the performance of the iBlackhole and
iBlackhole-DC algorithms.



5.1 The Experimental Setup 

Experimental Data. The experiments were conducted on four real-world data sets: Wiki, Amazon, 
Roget, and Stock. Table 1 shows some characteristics of these data sets. 



Table 1: Data characteristics 



Data set            # nodes      # edges
Wiki                  7,115      103,689
Wiki500                 500        3,865
Wiki1000              1,000        9,741
Wiki1500              1,500       16,389
Wiki1500-full         1,500       16,820
Amazon              262,111    1,234,877
Amazon1000            1,000        3,952
Amazon500-full          500        1,911
Roget                 1,022        5,075
Roget-full            1,022        5,127
Stock-0.35            2,453          273



The Wiki Data Set. There are 7,115 nodes and 103,689 edges in the Wiki data set [14]. To make the
brute-force algorithm runnable, we derived three subgraphs from the original graph, with 500, 1,000,
and 1,500 nodes respectively. These subgraphs are named Wiki500, Wiki1000, and Wiki1500. In
addition, we synthesized a weakly connected directed graph, named Wiki1500-full, by adding some
edges to the Wiki1500 data set.

The Amazon Data Set. There are 262,111 nodes and 1,234,877 edges in the Amazon data set [13].
Similar to the Wiki data set, we derived a subgraph Amazon1000 with 1,000 nodes, and synthesized a
weakly connected directed graph Amazon500-full from the original graph.

The Roget Data Set. There are 1,022 nodes and 5,075 edges in the Roget data set [3]. Also, we 
synthesized a weakly connected directed graph Roget-full from the original graph, by adding some edges 
to it. 

Figure 10: An overview of the Dow Jones Index from January 2008 to June 2008

The Stock Data Set. We also generated a Stock network data set. Specifically, we collected from
Wharton Research Data Services [1] the daily stock prices of 3,081 instruments in the U.S. stock
market over a period of 125 consecutive trading days from Jan 2, 2008 to Jun 30, 2008. We tried to
avoid a period with a strong movement trend in the stock market, since the movements of all
instruments during such a period tend to be highly correlated with each other. As can be seen in
Figure 10, there was no strong trend in the Dow Jones index during the selected period. We then
removed the instruments in the Dow Jones and S&P 500 indexes from our collection. Those instruments
are more representative of the stock market and therefore tend to have high correlations with the
other instruments. Since we aim to find not-so-obvious blackhole patterns, we only consider
instruments outside these two indexes. After that, we constructed the Stock data set as follows:

1) Nodes in this data set correspond to instruments. There are 2,453 nodes in this data set.
2) We build a vector $P_i = \{p_{i1}, p_{i2}, \ldots, p_{it}, \ldots, p_{i125}\}$ for each
instrument, where $p_{it}$ is the closing price of instrument $i$ on day $t$.
3) We create a Boolean vector $B_i = \{b_{i1}, b_{i2}, \ldots, b_{it}, \ldots, b_{i124}\}$ based on
$P_i$, where $b_{it} = 1$ if $p_{i,t+1} \ge p_{it}$, and 0 otherwise.
4) For $X = \{x_1, \ldots, x_t, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_t, \ldots, y_n\}$,
$\rho_{xy}(k)$ is the lagged correlation when $Y$ is delayed by $k$. A symmetric situation can be
applied to get $\rho_{yx}(k)$. We compute the lagged correlations $\rho_{ij}(1)$ and $\rho_{ji}(1)$
for each pair of instruments $i$ and $j$.
5) There is an edge from node $j$ to node $i$ if $\rho_{ij}(1) > \theta$, where $\theta$ is a
pre-specified threshold.

Since we compute the lagged correlation with a 1-day delay between two instruments, an edge from
node $j$ to node $i$ indicates that the movement of instrument $j$ followed the movement of
instrument $i$ on the previous day. Here, we specify $\theta$ as 0.35 to get the Stock-0.35 data set.
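
The construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy price series, and the choice to compute the lagged correlation as the Pearson correlation between the first series on day t and the second series on day t + k are all assumptions made for the example.

```python
from itertools import permutations
from math import sqrt


def movement_vector(prices):
    """Boolean vector: 1 if the next closing price is >= the current one."""
    return [1 if nxt >= cur else 0 for cur, nxt in zip(prices, prices[1:])]


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def lagged_corr(x, y, k=1):
    """Assumed form: correlation of x on day t with y on day t + k."""
    return pearson(x[:-k], y[k:])


def build_edges(prices_by_ticker, theta=0.35, k=1):
    """Edge (j, i) means j's movement followed i's movement k days earlier."""
    moves = {t: movement_vector(p) for t, p in prices_by_ticker.items()}
    edges = set()
    for i, j in permutations(moves, 2):
        if lagged_corr(moves[i], moves[j], k) > theta:
            edges.add((j, i))
    return edges


# Hypothetical toy data: "F" repeats the up/down moves of "L" one day later.
toy_prices = {
    "L": [10, 11, 10, 11, 12, 11, 10, 11, 10, 11],
    "F": [20, 19, 20, 19, 20, 21, 20, 19, 20, 19],
}
edges = build_edges(toy_prices)
```

On this toy input the only edge produced goes from the follower "F" to the leader "L", which matches the edge direction described in step 5.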

Note that the method we used to construct the Stock data set is similar to that of [4]. However, there
are some differences. We used the lagged Pearson correlation among instruments and ended up with a
directed graph, whereas Boginski et al. [4] employed the general Pearson correlation and constructed an
undirected graph.

Experimental Platform. All the experiments were performed on a Dell Optiplex 960 desktop with an
Intel Core 2 Quad Q9550 processor and 4 GB of memory, running the Windows XP Professional Service
Pack 3 operating system.

5.2 An Overall Comparison

In this subsection, we provide an overall comparison of the Brute-Force, iBlackhole, and iBlackhole-DC
algorithms.

First, we compare the performance of the three algorithms on different data sets with almost the same
number of nodes. In this experiment, we choose the data sets Wiki1000, Amazon1000, and Roget. Figure 11
shows the running time of these algorithms. As can be seen, both the Brute-Force and iBlackhole
algorithms are runnable only for blackhole patterns up to a certain number of nodes, although iBlackhole
can go a little bit further than Brute-Force. In contrast, the iBlackhole-DC algorithm is runnable for
finding n-node blackhole patterns with a large n value.
Figure 11: Running time of Brute-Force, iBlackhole, and iBlackhole-DC on different data sets

The running times of the three algorithms for detecting blackhole patterns with different numbers of
nodes form three approximately straight lines on a logarithmic scale for all three data sets. (For
iBlackhole-DC, this is clearer if we focus only on n ≥ 4.) This indicates that the running time of
these algorithms grows exponentially with n. Also, the slopes of the three performance curves for each
data set are significantly different. For Brute-Force, since we perform an exhaustive search from the
beginning and the numbers of nodes in the three data sets are almost the same, the slopes in the three
subfigures are almost the same. For iBlackhole, as well as iBlackhole-DC, the slopes differ slightly
across data sets: the slope of the curve on the Wiki1000 data set is larger than the slopes on
Amazon1000 and Roget. For both iBlackhole and iBlackhole-DC, we prune irrelevant nodes from each data
set. However, the pruning effect depends on the graph properties of each data set (in particular, the
average in-degree and out-degree play an important role). This makes the running times of the
iBlackhole and iBlackhole-DC algorithms vary across data sets, although they remain much lower than
that of the Brute-Force algorithm.
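
The link between a straight line on a logarithmic scale and exponential growth can be made explicit. As a sketch, assume the running time for n-node patterns is modeled as a constant c times a base a raised to the power n; the symbols c and a are illustrative and are not quantities reported in the experiments.

```latex
T(n) \approx c \, a^{n}
\quad\Longrightarrow\quad
\log T(n) \approx \log c + n \log a ,
\qquad
\frac{T_1(n)}{T_2(n)} \approx \frac{c_1}{c_2} \left( \frac{a_1}{a_2} \right)^{n} .
```

Under this model the slope of each line is log a, so two algorithms whose lines have different slopes differ in running time by a factor that itself grows exponentially with n.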

Next, we compare the performance of the three algorithms on the same data set with different numbers
of nodes. In this experiment, we choose the data sets Wiki500, Wiki1000, and Wiki1500. Figure 12 shows
the running time of the three algorithms on those three data sets.

The overall performance of the three algorithms is very similar to that in the first experiment.
However, there is still something interesting here. We can observe that the slopes of the three lines
in these three data sets are almost the same. (For iBlackhole-DC, this is clearer if we focus only on
n ≥ 4.) Since these three subgraphs are derived from the same network, the inherent graph properties of
these data sets are similar. This might be the reason that similar slopes are observed in the results.
Figure 12: Running time of Brute-Force, iBlackhole, and iBlackhole-DC for different # nodes



5.3 iBlackhole vs. iBlackhole-DC

In this subsection, we compare the performance of the iBlackhole and iBlackhole-DC algorithms. We
show how significantly the divide-and-conquer strategy improves the performance of iBlackhole. In this
experiment, we choose three synthesized weakly connected directed networks: Amazon500-full,
Roget-full, and Wiki1500-full.
Figure 13: The running time of iBlackhole and iBlackhole-DC algorithms on different data sets



Figure 13 shows the running time of these two algorithms on those three data sets. In the figure, we can
see that iBlackhole-DC is several orders of magnitude faster than iBlackhole, since it divides one
large, exponentially growing search space into several much smaller ones, and thus greatly reduces the
computational cost.

Figure 14 visualizes the structures of the data sets before and after applying the first pruning
scheme while detecting 7-node blackhole patterns. This figure is drawn with Pajek [17]. From this
figure, we can observe that the number of nodes in each data set decreases dramatically after pruning,
and each network becomes very sparse. Table 2 shows some main characteristics of the data sets after
pruning. As can be seen, while the original data sets are all weakly connected, we still get a large
number of connected components after pruning. Therefore, the divide-and-conquer strategy can help
dramatically reduce the search space.
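
To illustrate why splitting a pruned graph into connected components shrinks the search space, the following Python sketch groups nodes into weakly connected components with a union-find structure and compares the number of n-node candidate subsets before and after the split. The function names and the toy graph are hypothetical; Table 2 does not report individual component sizes, so the numbers here are not taken from the experiments.

```python
from math import comb


def weak_components(nodes, edges):
    """Group nodes into weakly connected components (edge direction ignored)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def candidate_count(num_nodes, n):
    """Number of n-node subsets an exhaustive search would have to examine."""
    return comb(num_nodes, n)


# Hypothetical pruned graph: 9 nodes that fall into three components of size 3.
toy_nodes = range(1, 10)
toy_edges = [(1, 2), (2, 3), (4, 5), (6, 5), (7, 8), (9, 8)]
components = weak_components(toy_nodes, toy_edges)

n = 3
whole_graph = candidate_count(len(toy_nodes), n)
per_component = sum(candidate_count(len(c), n) for c in components)
```

For this toy graph, searching the whole graph means examining 84 three-node subsets, while searching each component separately means examining only 3.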
Figure 14: Visualizations of structures of different data sets before and after pruning



Table 2: Characteristics of data sets after pruning 



Data set      # nodes    # edges    # connected components
Amazon500          14         36                         3
Roget              32         37                        11
Wiki1500          252        209                        43



5.4 Blackhole Patterns in the Stock Data

Here, we show an application of blackhole patterns for understanding the structural relationship of 
stock movement. 

Figure 15 shows two blackhole patterns identified in the Stock-0.35 data set. Owens Corning (ticker:
OC) is in the left blackhole pattern. The Westmoreland Coal Company (ticker: WLB) has an outlink to
OC. This indicates that the price movement of WLB followed the price movement of OC. Some research
shows that Owens Corning is one of the biggest building material producers in the country. Its
products include manufactured stone products used in buildings. In recent years, there has been a
trend in the industry for companies to develop innovative building materials by recycling the waste of
the energy industry, which is primarily the residual byproducts of coal combustion. As an energy
company, WLB owns five coal mines. Therefore, it is understandable that the stock price of the
Westmoreland Coal Company has a lagged correlation with the stock price of Owens Corning. The other two
companies in this pattern are Venoco Inc. (ticker: VQ) and Helmerich & Payne Inc. (ticker: HP). Venoco
Inc. [2] is an energy company primarily engaged in the acquisition, exploration, exploitation and
development of oil and natural gas properties, while Helmerich & Payne Inc. [2] is a contract drilling
company drilling oil and gas wells for others. Therefore, it is not surprising that the stock price
movements of these two companies are lag-correlated with the stock price of the Westmoreland Coal
Company. Indeed, blackhole patterns can help illustrate this type of structural relationship among the
stock movements of several companies.
Figure 15: Illustration: two blackhole patterns identified in the Stock-0.35 data set

The second blackhole pattern is a star-shaped blackhole, which indicates that the stock prices of the
other six instruments are triggered by Citadel Broadcasting Corp. (ticker: CDL). Among these six
companies, there are one telecommunication company (ADCT), three IC-related companies (ISIL, TRID, and
ZRAN), and one highly engineered steel products company (TKR), all of which are related to the
broadcasting corporation to some extent. The remaining company is a wellness solution provider, which
may be involved in this pattern by chance or for some unknown reason.
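
Whether a given group of nodes forms a blackhole can be checked directly from the edge list. The Python sketch below assumes, following the description in the abstract, that a blackhole is a group with no outlinks to the rest of the graph; the extra requirements that the group has at least two nodes and is weakly connected are assumptions of this sketch, and the node names in the toy star graph are hypothetical rather than the tickers in Figure 15.

```python
def is_blackhole(group, edges):
    """True if `group` has no outlinks to the rest and is weakly connected."""
    group = set(group)
    if len(group) < 2:  # assumed minimum size
        return False
    # Condition 1: no edge leaves the group.
    if any(u in group and v not in group for u, v in edges):
        return False
    # Condition 2 (assumed): the induced subgraph is weakly connected.
    neighbors = {v: set() for v in group}
    for u, v in edges:
        if u in group and v in group:
            neighbors[u].add(v)
            neighbors[v].add(u)
    start = next(iter(group))
    seen, stack = {start}, [start]
    while stack:
        for w in neighbors[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == group


# Hypothetical star: a, b, and c each point to hub; x points into the group.
toy_edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("x", "a")]
star_ok = is_blackhole({"hub", "a", "b", "c"}, toy_edges)
leaky = is_blackhole({"x", "a"}, toy_edges)
```

Here the star group qualifies because the only edge touching the outside, from x to a, is an inlink, whereas the group made of x and a fails because a has an outlink to hub.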

This application is just a simple indication of the use of blackhole patterns. Indeed, blackhole
patterns can provide a unique view of some structural properties, and help us better understand the
interactions among some nodes in the network. However, we should note that this use of blackhole
patterns is still preliminary and more comprehensive studies are expected in the future.



6. RELATED WORK

Related work can be grouped into two categories. The first category includes the work on frequent
subgraph mining, which studies how to efficiently find frequent subgraphs in graph data. For instance,
Jiang et al. [11] proposed a measure for mining globally distributed frequent subgraphs in a single
labeled graph. Meanwhile, there are many studies on mining frequent subgraphs in multiple labeled
graphs [20, 18, 10, 19, 12, 6]. The problem of detecting blackhole patterns differs from the above work
for two reasons. First, the definition of blackhole patterns is different from the definition of
frequent subgraphs. Second, blackhole patterns are identified whether they are frequent or not.

The second category includes the work on detecting community structures in large networks.
Communities in a network are groups of nodes within which connections are dense, but between which
connections are sparse [15]. There is a large body of work on how to detect communities in a network.
For instance, Newman and Girvan [16, 8] proposed a betweenness-based method, Hopcroft [9] proposed a
stable method, and Ghosh [7] proposed a global influence based method to detect community structures.
All these methods detect community structures based on certain definitions and criteria. However, the
definition of blackhole patterns is different from the above definitions of communities. Also, once a
network has been decided, the number of n-node blackhole patterns is determined. In contrast, it is
usually difficult to know how many community structures are in the network.



7. CONCLUDING REMARKS 

In this paper, we formulated a problem of finding blackhole and volcano patterns in directed networks.
Both blackhole and volcano patterns can be observed in real-world scenarios, such as a trading ring for
market manipulation. Mining blackhole or volcano patterns is essentially a combinatorial problem. To
reduce the complexity of the problem, we first proved that the problem of finding blackhole patterns is
a dual problem of finding volcano patterns. Thus, we could focus only on mining blackhole patterns. To
that end, we derived two pruning schemes. The first scheme is based on a set of size-independent
pruning rules which help to prune the candidate search space effectively and thus can dramatically
reduce the computational cost of blackhole mining. Based on the first pruning scheme, we developed the
iBlackhole algorithm for mining blackhole patterns. The second scheme takes advantage of a unique
graph property; that is, we can search in each individual subgraph if the target directed graph
contains several disconnected subgraphs. By exploiting these two pruning schemes, we developed the
iBlackhole-DC algorithm for finding blackhole patterns.
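
The duality between the two patterns can be illustrated by reversing every edge. The Python sketch below is not the proof given in the paper; it only checks, on a hypothetical toy graph, that a group has no inlinks from the rest of the graph (the volcano-style condition) exactly when it has no outlinks to the rest in the reversed graph (the blackhole-style condition). All function and node names are assumptions for the example.

```python
from itertools import combinations


def no_outlinks(group, edges):
    """Blackhole-style condition: no edge goes from the group to the rest."""
    return not any(u in group and v not in group for u, v in edges)


def no_inlinks(group, edges):
    """Volcano-style condition: no edge comes from the rest into the group."""
    return not any(u not in group and v in group for u, v in edges)


def reverse(edges):
    """Flip the direction of every edge."""
    return [(v, u) for u, v in edges]


# Hypothetical toy graph.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("d", "c"), ("a", "d")]

# Compare both conditions on every non-empty subset of nodes.
duality_holds = all(
    no_inlinks(set(group), toy_edges) == no_outlinks(set(group), reverse(toy_edges))
    for size in range(1, len(toy_nodes) + 1)
    for group in combinations(toy_nodes, size)
)
```

Because reversing edges does not change which nodes are connected when direction is ignored, a procedure that finds blackholes in the reversed graph can in principle be reused to find volcanoes in the original one.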

Finally, as shown in the experimental results, the pruning effect of both pruning schemes is significant
and the iBlackhole-DC algorithm is several orders of magnitude faster than the iBlackhole algorithm,
which in turn outperforms a brute-force approach by several orders of magnitude.